The Locator/ID Separation Protocol (LISP) limits the growth of the Default-Free Zone routing tables by creating a highly aggregatable and quasi-static Internet core. However, LISP pushes the forwarding state to edge routers whose timely operation relies on caching of location to identity bindings. In this paper we develop an analytical model to study the asymptotic scalability of the LISP cache. Under the assumptions that (i) long-term popularity can be modeled as a constant Generalized Zipf distribution and (ii) temporal locality is predominantly determined by long-term popularity, we find that the scalability of the LISP cache is O(1) with respect to the number of prefixes (Internet growth) and users (growth of the LISP site). We validate the model and discuss the accuracy of our assumptions using several one-day-long packet traces.


1 Introduction 

The growth of the Default-Free Zone (DFZ) routing tables [20] and the associated churn observed in recent years have led to much debate as to whether the current Internet infrastructure is architecturally unable to scale. Sources of the problem were found to be partly organic, generated by the ongoing growth of the topology, but also related to operational practices which seemed to be the main drivers behind prefix deaggregation within the Internet's core. Diverging opinions as to how the latter could be solved triggered a significant amount of research that finally
materialized in several competing solutions (see [TH| 
and the references therein). 

In this paper we focus on location/identity separation approaches in general, and consider the Locator/ID Separation Protocol (LISP) [23] as their particular instantiation. LISP semantically decouples identity from location, currently overloaded by IP addresses, by creating two separate namespaces that unambiguously address end-hosts (identifiers) and their Internet attachment points (locators). This new level of indirection has the advantage that it supports the implementation of complex traffic engineering mechanisms but at the same time enables the locator space to remain quasi-static and highly aggregatable.

Although it is generally accepted that location/identity separation solutions alleviate the scalability limitations of the DFZ, they also push part of the forwarding complexity to the edge domains. On the one hand, they require mechanisms to register, distribute and retrieve bindings that link elements of the two new namespaces. On the other, LISP routers must store in-use mappings to speed up packet forwarding and to avoid generating floods of resolution requests. This raises the question: does the newly introduced LISP edge cache scale?

This paper provides an analytical answer by analyzing the scalability of the LISP cache with respect to the growth of the Internet and the growth of the LISP site. To this end we leverage the working-set theory and previous results that characterize the temporal locality of reference strings to develop a model that relates the LISP cache size to the miss rate. We find that the relation between cache size and miss rate depends only on the popularity distribution of destination prefixes. Additionally, for a given miss rate, as long as the popularity follows a Generalized Zipf distribution, the LISP cache size scales constantly, O(1), with respect to the growth of the Internet and the number of users, provided that the last two do not influence the popularity distribution. If this does not hold, then the cache scales linearly, O(N). To support our results, we also analyze the popularity distribution of destination prefixes in several one-day real-world packet traces, from two different networks and spanning a period of 3.5 years.

The rest of the paper is structured as follows. We provide a brief overview of LISP in Section 2. In Section 3 we derive the cache model under a set of assumptions and thereafter discuss its predictions and implications for LISP. In Section 4 we present empirical evidence that supports our assumptions and evaluate the model, while in Section 5 we discuss the related work. Finally, we conclude the paper in Section 6.

2 LISP Background 

LISP [55] belongs to the family of proposals that implement a location/identity split in order to address the scalability concerns of the current Internet architecture. The protocol specification has recently undergone IETF standardization [8]; however, development and deployment efforts are still ongoing. They are supported by a sizable community spanning both academia and industry and rely for testing on a large experimental network, the LISP-beta network [1].

The goal of splitting location and identity is to insulate core network routing, which should ideally only be aware of location information (locators), from the dynamics of edge networks, which should be concerned with the delivery of information based on identity (identifiers). To facilitate the transition from the current infrastructure, LISP numbers both namespaces using the existing IP addressing scheme, thus ensuring that routing within both core and stub networks stays unaltered. However, as locators and identifiers bear relevance only within their respective namespaces, a form of conversion from one to the other must be performed. LISP makes use of encapsulation and a directory service to perform such translation.

Figure 1: Example packet exchange between EID_SRC and EID_DST with LISP. Following intra-domain routing, packets reach xTR_A which obtains a mapping binding EID_DST to RLOC_B1 and RLOC_B2 from the mapping-system (steps 1-3). From the mapping, xTR_A chooses RLOC_B1 as destination and then forwards towards it the encapsulated packets over the Internet's core (step 4). xTR_B decapsulates the packets and forwards them to their intended destination.

Prior to forwarding a host-generated packet, a LISP router maps the destination address, or Endpoint IDentifier (EID), to a corresponding destination Routing LOCator (RLOC) by means of a LISP-specific mapping system. Once a mapping is obtained, the border router tunnels the packet from the source edge to the corresponding destination edge network by means of an encapsulation with a LISP-UDP-IP header. The outer IP header addresses are the RLOCs pertaining to the corresponding border routers (see Fig. 1). At the receiving router, the packet is decapsulated and forwarded to its intended destination. In LISP parlance, the source router, which performs the encapsulation, is called an Ingress Tunnel Router (ITR), whereas the one performing the decapsulation is named the Egress Tunnel Router (ETR). One that performs both functions is referred to as an xTR.

Since the packet throughput of an ITR is highly dependent on the time needed to obtain a mapping, but also to avoid overloading the mapping-system, ITRs are provisioned with map-caches that store recently used EID-prefix-to-RLOC mappings. Stale entries are avoided with the help of timeouts, called time to live (TTL), that mappings carry as attributes, whereas consistency is ensured by proactive LISP mechanisms through which the xTR owner of an updated mapping informs its peers of the change. Intuitively, the map-cache is most efficient in situations when destination EIDs present high temporal and/or spatial locality, and its size depends on the diversity of the visited destinations. As a result, performance depends entirely on the provisioned map-cache size, traffic characteristics and the eviction policy set in place.
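
To make the map-cache behavior described above concrete, the following minimal Python sketch models an LRU map-cache with TTL expiry. The class name `MapCache`, its methods, and the string prefixes and RLOC labels are illustrative assumptions; they do not come from the LISP specification or from any implementation.

```python
from collections import OrderedDict


class MapCache:
    """Toy LRU map-cache: EID-prefix -> (RLOCs, expiry time). Hypothetical API."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def lookup(self, eid_prefix, now):
        """Return the cached RLOCs, or None on a miss or an expired entry."""
        entry = self.entries.get(eid_prefix)
        if entry is not None and entry[1] > now:
            self.entries.move_to_end(eid_prefix)  # mark as most recently used
            self.hits += 1
            return entry[0]
        if entry is not None:
            del self.entries[eid_prefix]  # TTL expired: drop the stale entry
        self.misses += 1
        return None

    def insert(self, eid_prefix, rlocs, now, ttl):
        """Store a mapping, evicting the least recently used one if full."""
        if eid_prefix in self.entries:
            del self.entries[eid_prefix]
        elif len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # evict the LRU entry
        self.entries[eid_prefix] = (rlocs, now + ttl)
```

In this sketch a miss stands for the point where an ITR would query the mapping system and then call `insert` with the returned mapping and its TTL.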


3 Cache Model 

We start this section by discussing some of the fundamental properties of network traffic that may be exploited to gain a better understanding of cache performance. Then, assuming these properties are characteristic of real network traces, we devise a cache model. Finally, we analyze and discuss the predictions of the model.


3.1 Sources of Temporal Locality in 
Network Traffic 

We consider the following formalization of traffic, either at Web page or packet level, throughout the rest of the paper. Let D be a set of objects (Web pages, destination IP-prefixes, program pages, etc.). Then, we consider traffic to be a string of references $r_1, r_2, \ldots, r_i, \ldots$ where $r_i = o \in D$ is a reference at the $i$-th unit of time that has as destination, or requests, object $o$. Generally, we consider the length of the reference string to be N. Also, note that we use object and destination interchangeably.

Two of the defining properties of reference strings, important in characterizing cache performance, are the heavy-tailed popularity distribution of destinations and the temporal locality exhibited by the request pattern. We discuss both in what follows.


3.1.1 Popularity Distribution 

A large number of studies in fields as varied as linguistics [21], Web traffic, video-on-demand [3], p2p overlays [5] and flow-level traffic [22] found the probability distribution of objects to have a positive skew. Generally, such distributions are coined Zipf-like, i.e., they follow a power law whereby the probability of reference is inversely proportional to the rank of an object. The relation is generally summarized as $\nu(k) = \Omega / k^{\alpha}$, where $\nu$ is the frequency, or number of requests observed for an object, $k$ is the rank, $\Omega = 1/H(n, \alpha)$ is a normalizing constant and $H(n, \alpha)$ is the generalized harmonic number.
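
As an illustration, Zipf-like frequencies can be computed directly from the rank. The sketch below assumes the relation has the form ν(k) = Ω/k^α with Ω = 1/H(n, α), where n is taken to be the number of objects; the function names are hypothetical.

```python
def generalized_harmonic(n, alpha):
    """H(n, alpha) = sum over k = 1..n of 1 / k**alpha."""
    return sum(1.0 / k ** alpha for k in range(1, n + 1))


def zipf_frequencies(n, alpha):
    """Normalized frequency nu(k) = Omega / k**alpha for ranks 1..n."""
    omega = 1.0 / generalized_harmonic(n, alpha)
    return [omega / k ** alpha for k in range(1, n + 1)]
```

With this normalization the frequencies sum to one, and for α = 1 the object of rank 1 is referenced ten times more often than the object of rank 10.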

It is interesting to note that although Zipf's law has its origins in linguistics, it was found to be a poor fit for the statistical behavior of word frequencies with low or mid-to-high values of the rank variable. That is, it does not fit the head and tail of the distribution. Furthermore, its extension due to Mandelbrot (often called the Zipf-Mandelbrot law) only improves the fitting for the head of the distribution. Such discrepancies were also observed for Web-based and p2p reference strings. Often the head of the distribution is flattened, i.e., the frequency is lower than the one predicted by the law, or the tail has an exponential cutoff or a faster power-law decay. But these differences are usually dismissed on the basis of poor statistics in the high-rank region corresponding to objects with a very low frequency.

Nevertheless, Montemurro recently solved the problem in linguistics by extending the Zipf-Mandelbrot law such that for high ranks the tail undergoes a crossover to an exponential or larger-exponent power-law decay. Surprisingly, he found these features, i.e., deviations from the Zipf-like behavior, to hold especially well when very large corpora [21] are considered. We further refer to this model as the Generalized Zipf law or GZipf and, in light of these observations, we assume the following:

Assumption 1. The popularity distribution of destination IP-prefix reference strings can be approximated by a GZipf distribution.

3.1.2 Temporal locality

Temporal locality can be informally defined as the property that a recently referenced object has an increased probability of being re-referenced. One of the well-established ways of measuring the degree of locality of reference strings is the inter-reference distance distribution.

Breslau et al. found in [5] that strings generated according to the Independent Reference Model (IRM), that is, assuming that references are independent and identically distributed random variables drawn from a popularity distribution, have an inter-reference distribution similar to that of the original string. Additionally, they inferred that the probability of an object being re-referenced after t units of time is proportional to 1/t. Later, Jin and Bestavros showed that in fact temporal locality emerges from both long-term popularity and short-term correlations. However, they found that the inter-reference distance distribution is mainly induced through long-term popularity and therefore is insensitive to the latter. Additionally, they showed that by ignoring temporal correlations and assuming a Zipf-like popularity distribution with exponent $\alpha$, an object's re-reference probability after t units of time is proportional to $1/t^{2 - 1/\alpha}$. These observations then lead to our second assumption:

Assumption 2. Temporal locality in destination IP-prefix reference strings is mainly due to the prefix popularity distribution.

We contrast the two assumptions with the properties of several packet-level traces in Section 4. In what follows we are interested in characterizing the inter-reference distribution of a GZipf distribution and, further on, the cache miss rate, using the two statements as support.
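
A reference string that satisfies Assumption 2 by construction can be obtained with IRM. The following sketch, with hypothetical function names, draws independent references from a given popularity vector and builds the inter-reference distance histogram that is used to measure locality.

```python
import random
from collections import Counter


def irm_string(popularity, length, seed=0):
    """Draw i.i.d. references; popularity[i] is the weight of object i."""
    rng = random.Random(seed)
    objects = range(len(popularity))
    return rng.choices(objects, weights=popularity, k=length)


def inter_reference_distances(refs):
    """Histogram of distances between consecutive references to an object."""
    last_seen = {}
    histogram = Counter()
    for position, obj in enumerate(refs):
        if obj in last_seen:
            histogram[position - last_seen[obj]] += 1
        last_seen[obj] = position
    return histogram
```

Every reference other than the first one to each object contributes exactly one distance, so the histogram total equals the string length minus the number of distinct objects.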


3.2 GZipf generated inter-reference distribution

In this section we compute the inter-reference distance distribution for a GZipf popularity. The result is an extension of the one due to Jin and Bestavros for a Zipf-like popularity. As a first step we compute the inter-reference distribution for a single object and then by integration obtain the average for the whole reference string, which we denote by $f(t)$.

If $\nu$ is the normalized frequency, namely, the number of references to an object divided by the length of the reference string N, then, as shown in [21], the probability of observing objects with frequency $\nu$ in the reference string is:

$$p_\nu(\nu) \propto \frac{1}{\mu \nu^{r} + (\lambda - \mu)\nu^{q}} \qquad (1)$$

where $1 < r < q$ are the exponents that control the slope of the power laws in the two regimes and $\mu$ and $\lambda$ are two constants that control the frequency at which the tail undergoes the crossover.

From Assumption 2 it follows that references to an object are independent, whereby the inter-reference distance t is distributed exponentially with an expected value of $1/\nu$. Then, if we denote by $d(t, \nu)$ the number of times the inter-reference distance for an object with frequency $\nu$ is t, we can write:

$$d(t, \nu) \sim (\nu N - 1)\,\nu\, e^{-\nu t} \qquad (2)$$

If $\nu_{min}$ and $\nu_{max}$ are the minimum and, respectively, the maximum normalized frequency observed for the reference string, we can compute the inter-reference distance for the whole string as:

$$f(t) = \int_{\nu_{min}}^{\nu_{max}} p_\nu(\nu)\, d(t, \nu)\, d\nu \qquad (3)$$

Unfortunately, the integral has no closed-form solution; nevertheless, we can still characterize the properties of $f(t)$ in the two regimes of the GZipf distribution. In the high-frequency region, where the term having q as exponent dominates the denominator, we can write:

$$f_q(t) \sim \frac{\Gamma(3-q, \nu_k t) - \Gamma(3-q, \nu_{max} t)}{t^{3-q}} \qquad (4)$$

where $\Gamma(n, z) = \int_{z}^{\infty} x^{n-1} e^{-x}\, dx$ is the incomplete Gamma function and $\nu_k = \left(\frac{\mu}{\lambda - \mu}\right)^{\frac{1}{q-r}}$ is the frequency for which the two terms that make up the denominator are equal. It is useful to note that for low t values, which correspond to high frequencies, the numerator presents a constant plateau that quickly decreases, or bends, at the edge as $t \to 1/\nu_k$. Therefore, we can approximate:

$$f_q(t) \sim \frac{1}{t^{3-q}} \qquad (5)$$

Similarly, it may be shown that for low frequencies, that is, in the region where the term with r as exponent dominates:

$$f_r(t) \sim \frac{1}{t^{3-r}} \qquad (6)$$

Finally, we conclude that the inter-reference distance distribution can be approximated by a piecewise power law. Our result is similar to the single-sloped power law obtained by Jin and Bestavros under the assumption of Zipf-distributed popularity, or to the empirical observations by Breslau et al. in [5] for Web reference strings. However, due to its general form it should be able to capture the properties of more varied workloads. In the following section we use the
inter-reference distance distribution together with 
the working-set theory to deduce the miss rate of an 
LRU cache. 
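
The behavior of f(t) can also be explored numerically. The sketch below evaluates the integral (3) with a trapezoidal rule on a log-spaced grid, assuming that the density in (1) is 1/(μν^r + (λ - μ)ν^q) and that d(t, ν) = (νN - 1)ν e^{-νt}. The function names are hypothetical and any parameter values passed to them are arbitrary choices for illustration, not fitted values.

```python
import math


def crossover_frequency(mu, lam, r, q):
    """nu_k: frequency where the two denominator terms of (1) are equal."""
    return (mu / (lam - mu)) ** (1.0 / (q - r))


def p_nu(nu, mu, lam, r, q):
    """Unnormalized GZipf frequency density, assumed form of (1)."""
    return 1.0 / (mu * nu ** r + (lam - mu) * nu ** q)


def d(t, nu, n_refs):
    """Inter-reference count for one object of frequency nu, assumed form of (2)."""
    return (nu * n_refs - 1.0) * nu * math.exp(-nu * t)


def f(t, mu, lam, r, q, n_refs, nu_min, nu_max, steps=2000):
    """Trapezoidal estimate of the integral (3) on a log-spaced grid."""
    log_lo, log_hi = math.log(nu_min), math.log(nu_max)
    h = (log_hi - log_lo) / steps
    total = 0.0
    for i in range(steps + 1):
        nu = math.exp(log_lo + i * h)
        weight = 0.5 if i in (0, steps) else 1.0
        # d(nu) = nu * d(log nu), hence the extra factor nu.
        total += weight * p_nu(nu, mu, lam, r, q) * d(t, nu, n_refs) * nu
    return total * h
```

Evaluating `f` on a range of t values and plotting the result on log-log axes is one way to inspect where the curve changes slope relative to 1/ν_k.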

3.3 A Cache Model 

Denning proposed the use of the working-set as a tool to capture the set of pages a program must store (cache) in memory such that it may operate at a desired level of efficiency [5]. The idea is to estimate a program's locality, or in-use pages, with the help of a sliding window of variable length looking into the past of the reference string. In their seminal work characterizing the properties of the working-set [7], Denning and Schwartz showed that the inter-reference distance distribution gives, up to sign, the slope of the average miss rate, which in turn is the slope of the average working-set size, both taken as functions of the window size. The result is of particular interest as it provides a straightforward link between the properties of the reference string and the performance of a cache that uses the least recently used (LRU) eviction policy but whose size varies. To understand the latter, consider that the size of the working-set for a given window depends on the number of unique destinations within the window, which may vary. Still, under the condition that the reference string is obtained with IRM, the working-set size will be normally distributed with a low variance. We can approximate it as being constant and, as a result, the cache modeled by the working-set becomes an LRU cache of fixed size.
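
The link between the working-set and a fixed-size LRU cache can be illustrated with a small simulation. The sketch below, using hypothetical helper names, computes the miss rate of an LRU cache of a given size and the miss rate of a working-set with a given window over the same reference string.

```python
from collections import OrderedDict


def lru_miss_rate(refs, size):
    """Fraction of references that miss in an LRU cache of fixed size."""
    cache = OrderedDict()
    misses = 0
    for obj in refs:
        if obj in cache:
            cache.move_to_end(obj)  # refresh recency on a hit
        else:
            misses += 1
            if len(cache) >= size:
                cache.popitem(last=False)  # evict the least recently used
            cache[obj] = True
    return misses / len(refs)


def working_set_miss_rate(refs, window):
    """Fraction of references not seen during the previous `window` units."""
    last_seen = {}
    misses = 0
    for position, obj in enumerate(refs):
        if obj not in last_seen or position - last_seen[obj] > window:
            misses += 1
        last_seen[obj] = position
    return misses / len(refs)
```

For a cyclic string over three objects, a window of 3 and an LRU cache of size 3 both miss only on the first reference to each object, while a window of 2 and a cache of size 2 miss on every reference.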

We leverage in what follows the result above to deduce the miss rate of an LRU cache when fed by a reference string obtained using IRM and a GZipf popularity distribution. The miss rate for the upper part of $f(t)$ is:

$$m_q(t) = -\int \frac{C}{t^{3-q}}\, dt = -C\, \frac{t^{q-2}}{q-2} \qquad (7)$$

where $t < 1/\nu_k$, $1 < q < 2$ and C is a normalizing constant which ensures that $\sum_{t=1}^{N-1} C f(t) = 1$. We can further compute the average working-set size as:

$$s_q(t) = \int m_q(t)\, dt = \frac{C\, t^{q-1}}{(2-q)(q-1)} \qquad (8)$$

To obtain the miss rate as a function of the cache size, not of the inter-reference distance, we take the inverse of $s_q$ and replace it in (7). For $s < s_q(1/\nu_k)$ we get:

$$m_q(s) = C^{\frac{1}{q-1}}\, (2-q)^{-\frac{1}{q-1}}\, (q-1)^{\frac{q-2}{q-1}}\, s^{\frac{q-2}{q-1}} \propto s^{1 - \frac{1}{q-1}} \qquad (9)$$

This suggests that the asymptotic miss rate as a function of cache size is a power law of the cache size with an exponent dependent on the slope of the popularity distribution. Similarly, for large inter-reference distances, when $s > s_r(1/\nu_k)$:

$$m_r(s) \propto s^{1 - \frac{1}{r-1}} \qquad (10)$$

Then, for a reference string whose destinations have a GZipf popularity distribution and where the references to objects are independent, we find that the miss rate presents two power-law regimes and depends only on the exponents of the popularity distribution and the cache size. We test the ability of the equations to fit empirical observations in Section 4.4.


3.4 Cache Performance Analysis

We now investigate how cache size varies with respect to the parameters of the model if the miss rate is held constant. By inverting (9) and (10) we obtain the cache size as a function of the miss rate, $s(m)$ (11), where $\nu_k = \left(\frac{\mu}{\lambda-\mu}\right)^{\frac{1}{q-r}}$ and $0 < m < 1$.

We see that $s(m)$ is independent of both the number of packets N and the number of destinations D and is sensitive only to changes of the slopes of the popularity distribution, q and r, and of the frequency at which the two slopes intersect, $\nu_k$. We do note that C does depend analytically on N, as can be seen by considering C's defining expression (see the discussion of (7)): $1/C = H(1/\nu_k, 3-q) - \zeta(3-r, N) + \zeta(3-r, 1/\nu_k)$, where $H(n, m) = \sum_{k=1}^{n} 1/k^{m}$ is the generalized harmonic number and $\zeta(s, a) = \sum_{k=0}^{\infty} 1/(k+a)^{s}$ is the Hurwitz Zeta function. However, the first and last terms of the expression depend only on popularity parameters while the middle one quickly converges to a constant as N grows. It is therefore safe to assume that C is constant with respect to N and, consequently, that the number of packets does not influence $s(m)$.

Figure 2: Cache size as a function of a GZipf exponent for a fixed miss rate


On the other hand, if the parameters of the popularity distribution are modified, some interesting dependencies can be uncovered. For brevity, we explore only the case when q and r vary but still respect the constraint that $1 < r < q < 2$. When both exponents jointly change, the cache size required to maintain the miss rate will qualitatively vary as depicted in Fig. 2. Specifically, as their values approach 1, that is, when the popularity distribution is strongly skewed, cache size asymptotically goes to a low constant value, whereas when the exponents approach 2, the required cache size grows very fast; notice the super-linear growth in the log-log scale. Despite not being indicated by (11), $s(m)$ is defined when q or r are 2, that is, it does not grow unbounded. The expression can be obtained if we replace q by 2 in (5) and recompute all equations:


$$s(m) = (C + m)\, e^{-\frac{m}{C}} \qquad (12)$$
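
The dependence of the cache size on the model parameters can be sketched numerically for the high-frequency regime. The code below assumes that (9) has the form m_q(s) = (C/(2-q))^{1/(q-1)} (q-1)^{(q-2)/(q-1)} s^{(q-2)/(q-1)} and inverts it algebraically. The function names and the value of C used in any call are assumptions for illustration; C is held fixed here, although in the model it depends on the popularity parameters.

```python
def miss_rate_q(s, c, q):
    """Miss rate in the high-frequency regime, assumed form of (9); 1 < q < 2."""
    e = 1.0 / (q - 1.0)
    return (c / (2.0 - q)) ** e * (q - 1.0) ** ((q - 2.0) * e) * s ** (1.0 - e)


def cache_size_q(m, c, q):
    """Cache size for a target miss rate m, obtained by inverting miss_rate_q."""
    scale = (c / (2.0 - q)) ** (1.0 / (2.0 - q)) / (q - 1.0)
    return scale * m ** ((q - 1.0) / (q - 2.0))
```

Neither N nor D appears among the arguments, which mirrors the observation that, for fixed popularity parameters, the required cache size does not change with the number of packets or destinations. With C fixed, the required size for a given miss rate grows quickly as q approaches 2.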

3.5 Discussion of Asymptotic Cache 
Performance and Impact 

Using the results of the analysis performed in the previous section, we are now interested in characterizing the asymptotic scalability of the LISP cache size with respect to (i) the number of users in a LISP site, (ii) the size of the EID space and (iii) the parameters of the popularity distribution. To simplify the discussion, we assume there are no interactions between the first two and the third:

Assumption 3. The destination prefix popularity distribution is independent of the number of users in a LISP site and the size of the EID space.

Thus, (i) contemplates the variation of the number of packets N, (ii) the variation of the number of destinations D and (iii) the variation of the GZipf parameters q, r, $\mu$ and $\lambda$, independently. We acknowledge that the popularity distribution may be influenced by a multitude of factors, and in particular by the growth of the number of users generating the reference string. Nonetheless, we argue that our assumption does make practical sense. For instance, a typical LISP router is expected to serve hundreds to thousands of clients, so fluctuations proportional to the size of the user set should not affect overall homogeneity and the popularity distribution. Additionally, although user interest in content changes quickly, the same is not necessarily true for the content sources, i.e., the prefixes from where the content is served, which the user cannot typically select. This split between content and its location can result in a relatively stable popularity distribution of the prefixes despite the dynamic popularity of actual content. We show an example network where this assumption holds in Section 4.2.

In the previous section we found that when the parameters of the popularity distribution are held constant, the cache size is independent of both the number of packets and destinations. As a result, for a fixed miss rate, cache size scales constantly, O(1), with the number of users within a LISP site and the size of the EID-prefix space. This observation has several fundamental implications for LISP's deployment. First, caches for LISP networks can be designed and deployed for a desired performance level which subsequently does not degrade with the growth of the site and the growth of the Internet address space. Second, splitting traffic between multiple caches (i.e., routers) for operational purposes, within a large LISP site, does not affect cache performance. Finally, signaling, i.e., the number of Map-Request exchanges, grows linearly with the number of users if no hierarchies or cascades of caches are used. This is because the number of resolution requests is $m(s)\,N$.

If the previous assumption does not hold, then, in the worst case, the cache size scales linearly with $|D|$. This follows if we consider that, as the growth of N and D flattens the distribution, thus leading to a uniform popularity, the cache size for a desired miss rate becomes proportional to $|D|$.

4 Empirical Evidence of Temporal Locality

In this section we verify the accuracy of our assumptions regarding the popularity distribution of destination prefixes and the sources of locality in network traffic. We also empirically verify the accuracy of the predictions regarding the performance of the LISP cache. But first, we present our datasets and experimental methodology.

4.1 Packet Traces and Cache Emulator

We use four one-day packet traces that consist only of egress traffic for our experiments. Three were captured at the 2 Gbps link that connects our University's campus network to the Catalan Research Network (CESCA) and span a period of 3.5 years, from 2009 to 2012. The fourth was captured at the 10 Gbps link connecting CESCA to the Spanish academic network (RedIRIS) in 2013. The UPC campus has about 36k users consisting generally of students, academic staff and auxiliary personnel, while CESCA provides transit services for 89 institutions that include the public Catalan schools, hospitals and universities. The important properties of the datasets are summarized in Table 1.

Table 1: Datasets Statistics

             upc 2009     upc 2011     upc 2012     cesca 2013
Date         2009-05-26   2011-10-19   2012-11-21   2013-01-24
Packets      6.5B         4.05B        5.57B        20B
Av. pkt/s    75.3k        46.9k        64.4k        232k
Prefixes     92.8k        94.9k        109.4k       143.7k
Av. pref/s   2.3k         1.95k        2.1k         2.56k


Table 2: Routing Tables Statistics

             upc 2009     upc 2011     upc 2012     cesca 2013
BGP_RT       288k         400k         450k         455k
BGP_φ        142k         170k         213k         216k
ρ            0.65         0.55         0.51         0.66


At the time of this writing there exists no policy as to how EID-prefixes are to be allocated. However, it is expected, and also the practice today in the LISP-beta network, to allocate EIDs in IP-prefix-like blocks. Consequently, we performed our analysis considering EID-prefixes to be of BGP-prefix granularity. For each packet within a trace we find the associated prefix using BGP routing tables downloaded from the RouteViews archive [21] that match the trace's capture date. We filtered out the more specific prefixes from the routing tables as they are generally used for traffic engineering and LISP offers a more efficient management of these operational needs. Table 2 gives an overview of the original (BGP_RT) and filtered (BGP_φ) routing table sizes as well as the ratio (ρ) between the number of prefixes observed within each trace and the filtered routing table size. Both UPC and CESCA visit daily more than half of the prefixes within BGP_φ.
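
The prefix filtering and address-to-prefix mapping steps can be sketched as follows. This is a simplified illustration using Python's standard `ipaddress` module, restricted to IPv4 and to small tables; the function names are hypothetical, and the quadratic scan would need to be replaced by a trie or similar structure for full routing tables.

```python
import ipaddress


def drop_more_specifics(prefixes):
    """Keep only prefixes that are not covered by a shorter one in the list."""
    nets = sorted((ipaddress.ip_network(p) for p in prefixes),
                  key=lambda n: (n.prefixlen, int(n.network_address)))
    kept = []
    for net in nets:
        # Shorter prefixes are visited first, so any covering prefix is in kept.
        if not any(net.subnet_of(k) for k in kept):
            kept.append(net)
    return kept


def eid_prefix(address, table):
    """Return the prefix in `table` that contains `address`, or None."""
    addr = ipaddress.ip_address(address)
    for net in table:
        if addr in net:
            return net
    return None
```

Because the filtered table contains no overlapping prefixes, at most one entry can match a given destination address, so a first-match scan is sufficient in this sketch.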

Apart from the popularity and temporal locality analysis, we also implemented a LISP ITR emulator to estimate LRU map-cache performance using the traces and the routing tables as input. We compare the predictions of our cache model with the empirical results in Section 4.4.

4.2 Popularity Distribution 

Figure 3 presents the frequency-rank distributions of our datasets for both absolute and normalized frequency. A few observations are in order. First, although clearly not accurately described by Zipf's law, they also slightly deviate from a GZipf. Namely, the head of the distribution presents two power-law regimes followed by a third that describes the tail, as can be seen in Fig. 3 (down). This may be either because a one-day sample is not enough to obtain accurate statistics in the Zipf-Mandelbrot head region, or because popularity for low ranks follows a more complex law. Still, we find that for all traces the frequencies of higher ranks (above 2000) are accurately characterized by two power-law regimes (see Fig. 5).

Secondly, the frequency-rank curves for the UPC datasets are remarkably similar. Despite the 50% increase of BGP_φ (i.e., D), changes in the Internet content provider infrastructure over a 3.5-year period, and perhaps even changes in the local user set, the popularity distributions are roughly the same.

Finally, the normalized frequency plots for all traces are similar, in spite of the large difference in the number of packets between the CESCA and UPC datasets. These observations confirm our assumption that growth of the number of users within the site or of the destination space does not necessarily result in a change of the popularity distribution.

To confirm that these results are not due to a bias of popularity for larger prefix sizes, that is, that larger prefixes are more likely to receive larger volumes of traffic because they contain more hosts, we checked the correlation between prefix length and frequency. We did not find any evidence in support of this (results not shown here).


Figure 3: Destination Prefix Popularity


4.3 Prefix Inter-Reference Distance Distribution

We now check if knowledge about the popularity distribution suffices to accurately characterize the inter-reference distance distribution or if short-term correlations must also be taken into account. To achieve this, we use a methodology similar to the one previously used for Web page traffic. We first generate random versions of our traces according to the IRM model, i.e., by considering only the popularity distribution and geometric inter-reference times, and then compare the resulting inter-reference distance distributions to the originals. Results are shown in Fig. 4. We find that for all traces, popularity alone is able to account for the greater part of the inter-reference distance distribution, as in the case of Web requests. The only disagreement is in the region with distances lower than 100, where short-term correlations are important and IRM traces underestimate the probability by a significant margin.

A rather interesting finding is that the short-term correlations in all traces are such that the power-law behavior observed for higher distances (t > 100) is extended down to distance 1. In this region, the exact inter-reference distance equation (4) is a poor fit to reality as it follows the IRM curve. However, the empirical results are appropriately described by our approximate inter-reference model (5), which avoids IRM's bend by assuming the numerator of (4) to be constant.


4.4 Cache Performance


Having found that our assumptions regarding net¬ 
work traffic properties hold in our datasests we now 
if the cache model (see 0 and ( [l0| ) is able predict 
real world LRU cache performance. 

As mentioned in Section |4.2| and as it may be seen 
in Fig. the head of the popularity distribution ex¬ 
hibits two power-law regiemes instead of one. Then, 
two options arise, we can either use the model disre¬ 
garding the discrepancies or adapt it to consider the 
low rank region behavior. For completness, we choose 
the latter in our evaluation. This only consists in ap¬ 
proximating Pu{v) (see 0 ) as having three regions, 
each dominated by an exponent Recomputing 
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Figure 4: Empirical and IRM generated inter¬ 
reference for the four traces 


(101 we get that the miss rate has three regions, each 
characterized by an ai. Choosing the first option 
would only result in an overestimation of cache miss 
rates for low cache sizes. 

To contrast the model with the empirical observations,
we performed a linear least squares fit of the
three regions of the popularity distribution. This
allowed us to determine the exponents $\alpha_i$, computed as
$1 + 1/s_i$, where $s_i$ is the slope of the ith segment, and
to roughly approximate the frequencies $\nu_{k_1}$ and $\nu_{k_2}$
at which the segments intersect. Using them as input
to (9) we get a cache miss rate estimate as shown
in Fig. 7. Generally, we see that the model is a
remarkably good fit for large cache sizes but
consistently underestimates the miss rate for sizes lower


Figure 5: Frequency-rank distribution of destination
prefixes and a linear least squares fit of the three
power-law regimes. $\alpha_i = 1 + 1/s_i$, where $s_i$ is the
slope of the ith segment.


than 1000. This may be due to the poor fit of the popularity
for low ranks. Nevertheless, a more elaborate
fitting of $\nu_{k_1}$ and $\nu_{k_2}$ should provide better results, as
may be seen in Fig. 6, where we performed a linear
least squares fit of the three power-law regions of the
cache miss rate. Knowing that the slope of the cache
miss rate is $1 - 1/(\alpha_i - 1)$ (see (10)), we computed
the exponents as depicted in the figure. Comparison
with those computed in Fig. 5 shows they are very
similar. Overall, we can conclude that the model
accurately predicts cache performance.

Figure 6: Empirical miss rate with cache size and a
linear least-squares fit of the exponent for the three
power-law regions. Notice the similarity with the
exponents of the three regions of the popularity
distribution in Fig. 5.
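
The fitting step described in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used for the figures: the helper names loglog_fit, exponent_from_slope and miss_rate_slope are hypothetical, the slope $s_i$ is taken as the magnitude of the log-log slope of a frequency-rank segment, and the data is synthetic (a single exact power law rather than three measured regions).

```python
import math


def loglog_fit(xs, ys):
    """Ordinary least-squares fit of log10(y) against log10(x).

    Returns (slope, intercept) of the fitted line in log-log space.
    """
    lx = [math.log10(x) for x in xs]
    ly = [math.log10(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((x - mean_x) ** 2 for x in lx)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lx, ly))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def exponent_from_slope(slope):
    """alpha = 1 + 1/s, with s assumed to be the slope magnitude."""
    return 1.0 + 1.0 / abs(slope)


def miss_rate_slope(alpha):
    """Log-log slope of the miss rate curve, 1 - 1/(alpha - 1)."""
    return 1.0 - 1.0 / (alpha - 1.0)


# Synthetic frequency-rank data: frequency proportional to rank^-2.
ranks = list(range(1, 101))
freqs = [1000.0 * k ** -2.0 for k in ranks]

slope, intercept = loglog_fit(ranks, freqs)
alpha = exponent_from_slope(slope)
print(slope, alpha, miss_rate_slope(alpha))
```

For measured data, the same fit would be applied separately to each of the three rank ranges, and the resulting exponents compared with those obtained by fitting the miss rate curve directly.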

5 Related Work 

Denning was first to recognize the phenomenon of 
temporal locality in his definition of the working- 
set [5] and together with Schwartz established the 
fundamental properties that characterize it [7]. Al¬ 
though initially designed for the analysis of page 
caching in operating systems, the ideas were later 
reused in other fields including Web page and route 
caching. 

In [2] Breslau et al. argued that empirical evidence
indicates that the popularity distribution of Web requests


Figure 7: Empirical miss rate with cache size together
with a fit by (9) and (10)


is Zipf-like with exponent $\alpha < 1$. Using this finding
and the assumption that temporal locality is mainly
induced through long-term popularity, they showed
that the asymptotic miss rate of an LRU cache, as a
function of the cache size, is a power law of exponent
$1 - \alpha$. In this paper we argue that GZipf with exponents
greater than 1 is a closer fit to real popularity
distributions and obtain a more general LRU cache
model. We further use the model to determine the
scaling properties of the cache.
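
The two exponent relations used in Section 4.4 can be combined to make the link with the Zipf-like result explicit. The following is only an algebraic restatement of those relations, under the assumption that $s_i$ denotes the magnitude of the log-log slope of the ith frequency-rank segment:

```latex
\begin{align*}
\alpha_i = 1 + \frac{1}{s_i}
  \quad &\Longrightarrow \quad \frac{1}{\alpha_i - 1} = s_i, \\
1 - \frac{1}{\alpha_i - 1} &= 1 - s_i .
\end{align*}
```

Read this way, the miss rate exponent in each region is one minus the frequency-rank slope, which has the same form as the exponent quoted above for the Zipf-like case. Note that the exponent of the Zipf-like distribution refers to the frequency-rank curve, whereas $\alpha_i$ is the GZipf exponent.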

Jin and Bestavros showed in m that the inter-reference
distribution is mainly determined by the
long-term popularity and only marginally by
short-term correlations. They also proved that the
inter-reference distribution of a reference string with
Zipf-like popularity distribution is proportional to
We build upon their work but also extend
their results, both by considering a GZipf popularity
distribution and by using them to deduce an LRU
cache model.

In the field of route caching, Feldmeier [^j and 
Jain [12] were among the first to evaluate the pos¬ 
sibility of performing destination address caching by 
leveraging the locality of traffic in network environ¬ 
ments. Feldmeier found that locality could be ex¬ 
ploited to reduce routing table lookup times on a 
gateway router, while Jain discovered that deterministic
protocol behavior limits the benefits of locality
for small caches. The works, though fundamental,
bear no practical relevance today as they were carried
out two decades ago, a time when the Internet was
still in its infancy.

Recently, Kim et al. [16] performed a measurement
study within the operational confinement of an ISP’s
network and showed the feasibility of route caching.
They showed by means of an experimental evaluation
that the LRU cache eviction policy performs close to
optimal and better than LFU. They also found that the
prefix popularity distribution is very skewed and that
the working-set size is generally stable with time. These
findings are in line with our empirical results and provide
practical confirmation for our assumption that the
popularity distribution can be described as a GZipf.

Several works have previously looked at cache performance
in loc/id split scenarios considering LISP as
a reference implementation. Iannone et al. m performed
an initial trace-driven study of the LISP map-cache
performance, while Kim et al. m both extended
and confirmed the previous results with the
help of a larger ISP trace. Zhang et al. [26] performed
a trace-based Loc/ID mapping cache performance
analysis assuming an LRU eviction policy and
using traffic captured at two egress links of the
China Education and Research Network backbone
network. Although methodologies differ between the
papers, in all cases the observed LISP cache
miss rates were found to be relatively small. This,
again, indirectly confirms the skewness of the popularity
distribution and its stability, at least for short
time scales.

Finally, in [4] we devised an analytical model for
the LISP cache size starting from empirical average
working-set curves, using the working-set theory.
Our goal was to model the influence of locality on
cache miss rates, whereas here we look to understand
how cache performance scales with respect to the defining
parameters of network traffic, that is, the popularity
distribution, the size of the LISP site and the size of
the EID space.


6 Conclusions 

LISP offers a viable solution to scaling the core routing
infrastructure of the Internet by means of a
location/identity split. However, this forces edge domain
routers to cache location-to-identity bindings
for timely operation. In this paper we answer the
following question: does the newly introduced LISP
edge cache scale?

Our findings show that the miss rate scales constantly,
O(1), with the number of users as well as
with the number of destinations. For this, we start
from two assumptions: (i) the popularity of destination
prefixes is described by a GZipf distribution and
(ii) temporal locality is predominantly determined
by long-term popularity. Fundamentally, these assumptions
are often observed to hold in the Internet
mnn but also in other fields such as web traffic
[2], on-demand video |S] or even linguistics m.
Arguably, they are inherent to human nature and, as
such, are expected to hold in the foreseeable future.
Nevertheless, in the paper we also show that if the
converse holds, then cache size scales linearly, O(N),
with the number of destinations.

At the time of this writing there is an open debate
on what the Internet should look like in the near future
and, in this context, it is important to analyze
the scalability of the various future Internet architecture
proposals. This paper fills this gap, particularly
for the Locator/ID split architecture. Furthermore,
our results show that edge networks willing to deploy
LISP will not face scalability issues in the size of their
map-cache, as long as both assumptions hold, even
if the edge network itself becomes larger (i.e., more
users) or the Internet grows (i.e., more prefixes).